Digital Forensic Architecture for Cloud Computing Systems: Methods of Evidence Identification, Segregation, Collection and Partial Analysis

Over the past few years, cloud computing has matured and revolutionized the way digital information is stored, transmitted, and processed. It is not just a hyped model; it has been embraced by Information Technology giants such as Apple, Amazon, Microsoft, Google, Oracle, IBM, HP, and others. Despite the technological innovations that have made it a feasible solution, cloud computing raises major concerns because of its architecture. The complex structure of cloud services makes it difficult to assign responsibility for a security breach. Over the last decade, cloud computing security has been among the topics most frequently surveyed by leading organizations such as IDC (International Data Corporation) and Gartner.

Recent attacks on cloud platforms, such as the Sony email hack and the Apple iCloud hack, exposed vulnerabilities in these platforms and showed that digital forensics in cloud computing environments requires immediate attention. The Cloud Security Alliance (CSA) conducted a survey on the issues of forensic investigation in cloud computing environments [1]. The survey document summarizes the international standards for cloud forensics and the integration of cloud forensic requirements into service level agreements (SLAs). In June 2014, NIST (National Institute of Standards and Technology) established the NIST Cloud Computing Forensic Science Working Group (NCC FSWG) to research the challenges of performing digital forensics on cloud computing platforms. This group aims to provide standards and technology research in areas of cloud forensics that cannot be handled with current technology and methods [2]. The NIST document lists all the challenges, together with a preliminary analysis of each that cites the associated literature and relates the challenge to the five essential characteristics of cloud computing. Our work focuses on some of the issues and challenges pointed out in these two documents [1, 2].

The remainder of the paper is organized as follows. Section 1 introduces cloud forensics. Section 2 reviews the literature. Section 3 presents the digital forensic architecture for cloud computing. Section 4 lists the methods of evidence source identification, segregation, and acquisition. Section 5 describes techniques for partial analysis of evidence related to a user account in a private cloud. Finally, Sect. 6 concludes the work and discusses future enhancements.

1.1 Cloud Forensics

For traditional digital forensics, well-known commercial and open source tools are available for forensic analysis [3, 4, 5, 6, 7, 8]. These tools may help to some extent with forensics in a virtual environment (virtual disk forensics), but they may fail to complete the forensic investigation process (i.e., from evidence identification to reporting) in the cloud, particularly cloud log analysis. In 2011, Ruan et al. conducted a survey titled “Cloud Forensics and Critical Criteria for Cloud Forensic Capability” whose major aims were to define cloud forensics, identify its challenges, and find research directions. The majority of the experts involved in the survey agreed on the definition “Cloud forensics is a subset of network forensics” [9]. Network forensics deals with forensic investigations of computer networks. Other researchers have also defined cloud forensics; Shams et al. in 2013 described it as “the application of computer forensic principles and procedures in a cloud computing environment” [10]. The cloud deployment model under investigation (private, community, hybrid, or public) determines how the digital forensic investigation is carried out. The work presented in this paper is restricted to the IaaS model of private or public cloud infrastructure.

2 Literature Review

Dykstra et al. used existing tools such as EnCase Enterprise, FTK, Fastdump, Memoryze, and FTK Imager to acquire forensic evidence from a public cloud over the Internet. The aim of their research was to measure the effectiveness and accuracy of traditional digital forensic tools in an entirely different environment such as the cloud. Their experiment showed that trust is required at many layers to acquire forensic evidence [11]. They also implemented user-driven forensic capabilities using the management plane of a private cloud platform called OpenStack [12]. The solution can collect virtual disks, guest firewall logs, and API logs through the management plane of OpenStack [13]. Their emphasis was on data collection and segregation of log data in data centers that use OpenStack as the cloud platform. Hence, their solution is not independent of the OpenStack platform, and to date it has not been added to the public distribution (at the time of writing, the latest stable version of OpenStack is Kilo, released on 30th April 2015).

To our knowledge, there is no digital forensic solution (or toolkit) that can be used on cloud platforms to collect the cloud data, segregate the multi-tenant data, and perform partial analysis on the collected data so as to minimize the overall processing time of cloud evidence. Inspired by the work of Dykstra and Sherman, we have contributed by designing a digital forensic architecture for the cloud, implementing modules for data segregation and collection, and implementing modules for partial analysis of evidence within (virtual hard disk, physical memory of a VM) and outside (cloud logs) the virtual environment.

3 Digital Forensics Architecture for Cloud Computing

In this research, we propose a digital forensic architecture for cloud computing platforms, shown in Fig. 1, which is based on the NIST Cloud Computing reference architecture [14] and on cloud operating systems such as OpenNebula, OpenStack, and Eucalyptus. A cloud operating system mainly consists of services that manage and provide access to resources in the cloud. User access to cloud resources is restricted according to the delivery model (IaaS, PaaS, or SaaS). In the IaaS model, the user has access to all layers other than the hardware and virtualization layers, whereas access is more restricted in PaaS and SaaS, as depicted in the architecture. Therefore, our contribution is restricted to the IaaS model.

A cloud provider may use external auditing services for auditing security, privacy, and performance. Our goal is to provide forensic investigative services for data collection, hybrid data acquisition, and partial evidence analysis. As shown in the figure, the admin of the CSP (Cloud Service Provider) can use the Forensic Investigative Services directly, whereas the cloud user and/or investigator has to depend on the cloud admin. The suggested digital forensic architecture for cloud computing systems is generic and can be used with any cloud deployment model.

3.1 Cloud Deployment (Cloud OS)

For experimental purposes, we set up an IaaS (Infrastructure-as-a-Service) cloud test bed using the two-node architecture of OpenStack. The conceptual architecture uses two network switches, one for internal communication between servers and among virtual machines and another for external communication, as shown in Fig. 2. The controller node runs the required OpenStack services, and the compute node runs the virtual machines. Any number of compute nodes can be added to this test bed, depending on how many virtual machines are required.

Fig. 2
Conceptual architecture of the private cloud IaaS

4 Digital Evidence Source Identification, Segregation and Acquisition

4.1 Identification of Evidence

A virtual machine is as good as a physical machine and creates a large amount of data in the cloud through its activity and management. The data created by a virtual machine includes the virtual hard disk, the physical memory of the VM, and logs. Virtual hard disk formats that different cloud providers may support include .qcow2, .vhd, .vdi, .vmdk, .img, etc. Every cloud provider may have its own mechanism for service logs (activity maintenance information), and hence log formats are not interoperable among cloud providers. The virtual hard disk file is available on the compute node where the corresponding virtual machine runs. Cloud logs are spread across the controller and compute nodes.
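As a minimal sketch of how candidate virtual disk files might be located on a compute node, the following Python snippet walks a directory and flags files by extension, using the formats listed above. It also checks for the QCOW2 magic bytes, because disk files are not always stored with an extension. The function names and the idea of scanning a single root directory are illustrative assumptions, not part of the authors' tooling.

```python
from pathlib import Path

# Extensions named in the text; a given provider may use others.
DISK_EXTENSIONS = {".qcow2", ".vhd", ".vdi", ".vmdk", ".img"}

# QCOW2 images begin with the bytes 'Q', 'F', 'I', 0xFB.
QCOW2_MAGIC = b"QFI\xfb"


def is_qcow2(path):
    """Return True if the file starts with the QCOW2 magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(4) == QCOW2_MAGIC


def find_virtual_disks(root):
    """Return files under root that look like virtual disks (hypothetical helper)."""
    hits = []
    for p in Path(root).rglob("*"):
        if not p.is_file():
            continue
        if p.suffix.lower() in DISK_EXTENSIONS or is_qcow2(p):
            hits.append(p)
    return sorted(hits)
```

A call such as `find_virtual_disks("/path/to/instances")` (the path is a placeholder) would return the matching files in sorted order; an extension match alone is only a hint and does not validate the file's contents.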

4.2 Evidence Segregation

A cloud computing platform is a multi-tenant environment in which end users share cloud resources as well as the log files that record the activity of cloud computing services. These log files cannot be handed to the investigator and/or cloud user for forensic activity because doing so would violate the privacy of other users in the same environment. Dykstra and Sherman [12] suggested a tree-based data structure called a “hash tree” to store API logs and firewall logs. Since we have not modified any of the OpenStack service modules, we implemented a different logging approach known as a “shared table” database. In this approach, a script runs on the host server where the OpenStack services are installed (for example, the “nova service”). The script mines the data from all the log files and creates a database table. This table contains the data of multiple tenants, and the key used to identify a record is the “Instance ID”, which is unique to a virtual machine. The cloud user and/or investigator can then, with the help of the cloud administrator, query the database from a remote system for any specific information, as explained in Sect. 4.3.
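The shared-table idea can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: it uses Python's built-in sqlite3 in place of the MySQL database described in Sect. 4.3, it assumes log lines of the form `<date> <time> <pid> <LEVEL> <module> ... [instance: <uuid>] <message>`, and the column names are hypothetical.

```python
import re
import sqlite3

# Assumed line layout (an assumption, not confirmed by the paper):
# "<date> <time> <pid> <LEVEL> <module> ... [instance: <uuid>] <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d+)?) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<module>\S+) "
    r".*?\[instance: (?P<instance_id>[0-9a-fA-F-]{36})\] (?P<message>.*)$"
)


def build_shared_table(conn, service, lines):
    """Parse log lines and store those tied to an instance; return rows added."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS servicelogs ("
        "instance_id TEXT, service TEXT, ts TEXT, "
        "level TEXT, module TEXT, message TEXT)"
    )
    rows = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m:  # lines without an instance ID are skipped
            rows.append((m["instance_id"], service, m["ts"],
                         m["level"], m["module"], m["message"]))
    conn.executemany("INSERT INTO servicelogs VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def records_for_instance(conn, instance_id):
    """Return only the records that belong to one virtual machine."""
    cur = conn.execute(
        "SELECT ts, service, level, message FROM servicelogs "
        "WHERE instance_id = ? ORDER BY ts",
        (instance_id,),
    )
    return cur.fetchall()
```

Filtering on the instance ID is what keeps one tenant's records apart from another's: a query for a given virtual machine never returns rows that belong to a different instance.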

4.3 Evidence Acquisition

We designed a generic architecture for cloud forensics and tested the solutions in a private cloud deployment using OpenStack (they may, however, extend to any deployment model). The tools designed and developed for data collection and partial analysis run on the investigator’s workstation, whereas the data segregation tool runs on the cloud hosting servers where the log files are stored. A generic view of the investigator’s interaction with the private cloud platform is shown in Fig. 3.

Fig. 3
Remote data acquisition in private cloud data center

Log data acquisition. The segregated log data is collected using the investigator’s workstation, i.e., the computer on which the acquisition and partial analysis tools are deployed. We created a MySQL database named logdb, with a table servicelogs under it, on the Controller node of OpenStack. Screenshots of the application connecting to the database from the investigator’s machine and viewing the table content are shown in Figs. 4 and 5, respectively. The investigator can go through the table content and form a query based on ATTRIBUTE, CONDITION (==, !=, <, <=, >, >=), and VALUE to filter the required evidence, and can download it to the investigator’s workstation if necessary, as shown in Fig. 5.
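One way such an ATTRIBUTE, CONDITION, VALUE filter could be turned into a safe query is sketched below. The column names are hypothetical, and sqlite3 stands in for the MySQL connection so that the example is self-contained; MySQL drivers for Python typically use `%s` rather than `?` as the parameter placeholder.

```python
import sqlite3

# Hypothetical column names for the servicelogs table.
ALLOWED_ATTRIBUTES = {"instance_id", "service", "ts", "level", "message"}

# Map the conditions offered to the investigator onto SQL operators.
SQL_OPERATORS = {"==": "=", "!=": "!=", "<": "<", "<=": "<=", ">": ">", ">=": ">="}


def build_filter_query(attribute, condition, value):
    """Return (sql, params) for a single ATTRIBUTE CONDITION VALUE filter."""
    if attribute not in ALLOWED_ATTRIBUTES:
        raise ValueError(f"unknown attribute: {attribute!r}")
    if condition not in SQL_OPERATORS:
        raise ValueError(f"unsupported condition: {condition!r}")
    # Attribute and operator come from whitelists; the value is bound as a
    # parameter, so user input is never spliced into the SQL text.
    sql = f"SELECT * FROM servicelogs WHERE {attribute} {SQL_OPERATORS[condition]} ?"
    return sql, (value,)


def run_filter(conn, attribute, condition, value):
    """Execute the filter and return the matching rows."""
    sql, params = build_filter_query(attribute, condition, value)
    return conn.execute(sql, params).fetchall()
```

Whitelisting the attribute and operator matters here because identifiers and operators cannot be bound as query parameters; only the value can.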

Fig. 4
Connecting to cloud hosting server that stores the shared table database

Fig. 5
Shared table with different attribute information

5 Partial Analysis (or Examination) of Evidence

The evidence examination and analysis approaches of traditional digital forensics are not directly applicable to cloud data because of virtualization and multi-tenancy. “Digital forensic triage” is required to help the cybercrime investigator decide whether a case is worth investigating. Digital forensic triage is a technique of selective data acquisition and analysis that minimizes the processing time of digital evidence. We now present the methods of partial analysis (also called evidence examination) required for virtual machine data.

5.1 Within the Virtual Machine

By examining different parts of the evidence at the scene of the crime, we provide the investigator with a sufficient knowledge base covering the file system metadata, the content of logs (for example, the content of registry files in Windows), and the internals of physical memory. With this knowledge base, the investigator gains an in-depth understanding of the case under investigation and may save a considerable amount of valuable time, which can then be used for further analysis.

Examination of file system metadata. Once the forensic image of the virtual hard disk has been obtained on the investigator’s workstation, the examination of file system metadata or logs (for example, registry files in Windows) begins, as shown in Fig. 6. Before using the system metadata extractor or the OS log analyzer (for example, the Windows registry analyzer), the investigator has to mount the acquired virtual disk as a virtual drive.

After mounting, the virtual disk behaves like a regular drive at its mount point. The system metadata extractor shown in Fig. 7 lists the metadata of the files and folders available in the different partitions of the virtual hard disk. For example, on a machine that uses NTFS as its file system, we extracted metadata for files and folders such as MFT record number, active/deleted status, file/folder type, filename, file creation date, and file accessed date. This report may differ for other file systems (FAT32, EXT3, HFS, etc.).
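A simplified illustration of listing metadata from a mounted virtual drive is given below. It relies only on what the operating system reports through `os.lstat`, so it cannot recover NTFS-specific fields such as MFT record numbers or deleted entries, which require parsing the file system itself. The function name and record layout are assumptions made for this sketch.

```python
import os
import stat
from datetime import datetime, timezone


def _iso(ts):
    """Format a POSIX timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


def list_metadata(mount_point):
    """Walk a mounted drive and collect basic metadata for files and folders."""
    records = []
    for dirpath, dirnames, filenames in os.walk(mount_point):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symbolic links
            except OSError:
                continue  # unreadable entry; skip rather than abort the walk
            records.append({
                "name": name,
                "path": path,
                "type": "folder" if stat.S_ISDIR(st.st_mode) else "file",
                "size": st.st_size,
                "modified": _iso(st.st_mtime),
                "accessed": _iso(st.st_atime),
                # st_ctime is the creation time on Windows but the time of the
                # last metadata change on most Unix systems.
                "ctime": _iso(st.st_ctime),
            })
    return records
```

Mounting the image read-only before running such a walk helps avoid altering access times on the evidence.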

Examination of registry files. The Windows operating system stores configuration data in the registry, which makes the registry one of the most important sources for digital forensics. The registry is a hierarchical database that can be described as a central repository for configuration data: it stores several elements of information, including system details, application installation information, networks used, attached devices, history lists, etc. [15]. Registry files are user specific, and their location depends on the version of the operating system (Windows 2000, XP, 7, 8, etc.). To retrieve specific information from the registry, the investigator chooses the mounted virtual drive, the operating system, the user, and the element of information to be retrieved, as shown in Fig. 8. Based on the items selected, a sample report is generated in plain text format.

Examination of physical memory. Physical memory (RAM, also called volatile memory) contains a wealth of information about the running state of the system, such as running and hidden processes, maliciously injected code, the list of open connections, command history, passwords, and clipboard content. We used Volatility 2.1 [16] plugins to capture some of the important information from the physical memory of the virtual machine, as shown in Fig. 9.

Apart from the selective memory analysis, we implemented a multiple-keyword search using the Boyer-Moore [17] pattern matching algorithm. The investigator enters keywords enclosed in double quotes and separated by commas, as shown in Fig. 10. For pattern searching, we implemented regular expression search for URLs, phone numbers, email IDs, and IP addresses, as shown in Fig. 11.

Fig. 10
Multiple keywords search (indexing)
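The keyword search can be illustrated with the sketch below. Note that it uses the Boyer-Moore-Horspool simplification (bad-character rule only) rather than the full Boyer-Moore algorithm cited in the text, and that the input parsing and function names are assumptions made for illustration.

```python
import re


def parse_keywords(text):
    """Extract keywords from input such as '"admin", "password"'."""
    return re.findall(r'"([^"]+)"', text)


def horspool_search(haystack: bytes, needle: bytes):
    """Return all offsets of needle in haystack (Boyer-Moore-Horspool)."""
    m, n = len(needle), len(haystack)
    if m == 0 or m > n:
        return []
    # Bad-character table: distance from the last occurrence of each byte
    # (excluding the final position) to the end of the needle.
    shift = {b: m - 1 - i for i, b in enumerate(needle[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        if haystack[pos:pos + m] == needle:
            hits.append(pos)
        # Shift by the table entry for the byte aligned with the needle's end.
        pos += shift.get(haystack[pos + m - 1], m)
    return hits


def search_keywords(data: bytes, keywords):
    """Map each keyword to the list of byte offsets where it occurs."""
    return {kw: horspool_search(data, kw.encode("utf-8")) for kw in keywords}
```

For example, `search_keywords(dump, parse_keywords('"admin", "password"'))` would return the byte offsets of each keyword in a memory dump held in `dump`. Searching raw bytes this way only finds UTF-8 or ASCII occurrences; text stored as UTF-16 in memory would need a separate pass with a different encoding.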

Fig. 11
Multiple pattern search (indexing)
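A rough sketch of the pattern search follows. The regular expressions are deliberately simple illustrations chosen for this example, not the expressions used by the authors, and they will miss or over-match some real-world formats (international phone numbers in particular).

```python
import re

# Illustrative patterns only; real-world formats vary widely.
PATTERNS = {
    "url": re.compile(r"https?://[^\s\"'<>]+"),
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}"
        r"(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    # Loose phone pattern: optional '+', then digits with spaces or hyphens.
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def find_patterns(text):
    """Return a dict mapping each pattern name to the matches found in text."""
    return {name: rx.findall(text) for name, rx in PATTERNS.items()}
```

Applied to text extracted from a memory image, `find_patterns` returns one list of matches per pattern name, which can then be reviewed or de-duplicated by the investigator.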

6 Conclusion and Future Work

Adapting digital forensic techniques to the cloud environment is challenging in many ways. As a business model, the cloud presents a range of new challenges to digital forensic investigators because of its unique characteristics. Forensic investigators and researchers need to adapt existing digital forensic practices and develop new forensic models that enable investigators to perform digital forensics in the cloud.

In this paper, we designed a digital forensic architecture for cloud computing systems, which may be useful to the digital forensic community for designing and developing new tools in the area of cloud forensics, and we framed ways to perform digital evidence source identification, segregation, and acquisition of evidentiary data. In addition, we formulated methods for examining evidence within (virtual hard disk, physical memory of a VM) and outside (logs) the virtual environment. The approach we suggested for segregating log data allows a software client to collect cloud evidentiary data (forensic artifacts) without disrupting other tenants. To minimize the processing time of digital evidence, we proposed solutions for the initial forensic examination of virtual machine data (virtual hard disk, physical memory of a VM) in the places where digital evidence artifacts are most likely to be present. Because this gives the investigator a better understanding of the case early on, it saves considerable time, which can be used for further analysis. Hence, the overall investigation may take less time than it otherwise would. The mechanisms we developed were tested in the OpenStack cloud environment. In future, we plan to test the solutions on other platforms.